Aggregator support prefetch #9679

Open
wants to merge 24 commits into
base: master

@guo-shaoge guo-shaoge commented Nov 28, 2024

What problem does this PR solve?

Issue Number: close #9680

Problem Summary:

What is changed and how it works?

  1. Support prefetch for HashTable and StringHashTable
    1. For StringHashTable, emplace now operates directly on the specific submap instead of going through the StringHashMap method, because this makes prefetch easier to implement and is better for performance.
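
The prefetch idea in item 1 can be illustrated with a minimal sketch. This is not the PR's code: `SumTable`, `aggregateBatch`, the hash function, and the prefetch distance of 16 are all hypothetical, chosen only to show the pattern of hashing a batch up front and prefetching the slot of a later row while the current row is being inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity open-addressing table mapping int64 keys to sums.
// It is NOT TiFlash's HashTable; it has just enough structure to show the pattern.
struct SumTable
{
    struct Cell
    {
        std::int64_t key = 0;
        std::int64_t sum = 0;
        bool used = false;
    };

    std::vector<Cell> cells;
    std::size_t mask;

    explicit SumTable(std::size_t capacity_pow2)
        : cells(capacity_pow2)
        , mask(capacity_pow2 - 1)
    {
        // Capacity must be a non-zero power of two so that `hash & mask` is a valid slot.
        assert(capacity_pow2 != 0 && (capacity_pow2 & mask) == 0);
    }

    // Simple 64-bit mixer (an assumption; any reasonable hash works here).
    static std::size_t hash(std::int64_t key)
    {
        std::uint64_t x = static_cast<std::uint64_t>(key);
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return static_cast<std::size_t>(x);
    }

    // Ask the CPU to start loading the home slot of a hash value.
    void prefetch(std::size_t h) const
    {
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(&cells[h & mask]);
#else
        (void)h; // no-op on compilers without this builtin
#endif
    }

    // Linear probing. There is no resize, so the caller must keep the table from filling up.
    void addWithHash(std::int64_t key, std::int64_t value, std::size_t h)
    {
        std::size_t pos = h & mask;
        while (cells[pos].used && cells[pos].key != key)
            pos = (pos + 1) & mask;
        cells[pos].used = true;
        cells[pos].key = key;
        cells[pos].sum += value;
    }

    // Returns the sum for `key`, or 0 if the key is absent.
    std::int64_t get(std::int64_t key) const
    {
        std::size_t pos = hash(key) & mask;
        while (cells[pos].used)
        {
            if (cells[pos].key == key)
                return cells[pos].sum;
            pos = (pos + 1) & mask;
        }
        return 0;
    }
};

// Hash the whole batch first, then insert row i while prefetching the slot of row i + distance.
void aggregateBatch(SumTable & table, const std::vector<std::int64_t> & keys, const std::vector<std::int64_t> & values)
{
    constexpr std::size_t prefetch_distance = 16; // assumed value, would need tuning
    const std::size_t n = keys.size();

    std::vector<std::size_t> hashes(n);
    for (std::size_t i = 0; i < n; ++i)
        hashes[i] = SumTable::hash(keys[i]);

    for (std::size_t i = 0; i < n; ++i)
    {
        if (i + prefetch_distance < n)
            table.prefetch(hashes[i + prefetch_distance]);
        table.addWithHash(keys[i], values[i], hashes[i]);
    }
}
```

Prefetching is expected to matter most when the table is much larger than the CPU cache (the high distinct-rate queries below), and to add little or even cost something when only a handful of groups exist.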

Benchmark

workloads: TPCH-50G
queries:

-- Q1-1: key_int64; distinct rate: 10M/300M; HashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_quantity), l_partkey from lineitem group by l_partkey limit 1 offset 9000000;
-- Q1-2: key_int64; 75M/300M; HashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_quantity), l_orderkey from lineitem group by l_orderkey limit 1 offset 70000000;
-- Q1-3: key_int64; 7/300M; HashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_quantity), l_linenumber from lineitem group by l_linenumber;

-- Q2-1: one_key_strbinpadding_phmap; 2/300M; StringHashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount), l_linestatus from lineitem group by l_linestatus;
-- Q2-2: one_key_strbinpadding; 104M/300M; StringHashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount) from lineitem group by l_comment limit 1 offset 100000000;


-- Q3-1: key_serialized as group by method;  33/300M; HashMap with StringRef key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount), l_returnflag from lineitem group by l_returnflag, l_discount;
-- Q3-2: key_serialized as group by method; 77M/300M; HashMap with StringRef key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount), l_returnflag from lineitem group by l_returnflag, l_discount, l_extendedprice limit 1 offset 75000000;


-- Q4-1: two_keys_num64_strbinpadding; 21/300M; HashMap with StringRef key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount) from lineitem group by l_returnflag, l_linenumber;
-- Q4-2: two_keys_num64_strbinpadding; 29.9M/300M; HashMap with StringRef key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount), l_partkey from lineitem group by l_returnflag, l_partkey limit 1 offset 29000000;


-- Q5-1: keys_128; 77/300M; HashMap with UInt128 key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_linenumber), l_discount from lineitem group by l_linenumber, l_discount;
-- Q5-2: keys_128; 5M/300M; HashMap with UInt128 key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_linenumber), l_discount from lineitem group by l_suppkey, l_discount limit 1 offset 4000000;


-- Q6-1: key_string; 4/300M; StringHashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_quantity), l_shipinstruct from lineitem group by l_shipinstruct;

-- Q7-1: keys_256; 290M/300M; HashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_suppkey) from lineitem group by l_suppkey, l_tax, l_discount, l_partkey limit 1;

Results:

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Signed-off-by: guo-shaoge <[email protected]>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Nov 28, 2024

ti-chi-bot bot commented Nov 28, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from guo-shaoge, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed do-not-merge/needs-linked-issue labels Nov 28, 2024
Signed-off-by: guo-shaoge <[email protected]>
@guo-shaoge guo-shaoge changed the title Aggregator support prefetch Aggregator support prefetch and new hasher Dec 2, 2024
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
@guo-shaoge

/retest

Signed-off-by: guo-shaoge <[email protected]>
This reverts commit 3e30f95.
Signed-off-by: guo-shaoge <[email protected]>
@guo-shaoge guo-shaoge changed the title Aggregator support prefetch and new hasher Aggregator support prefetch Dec 4, 2024
Signed-off-by: guo-shaoge <[email protected]>

ti-chi-bot bot commented Dec 5, 2024

@guo-shaoge: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-integration-test | 352b710 | link | true | /test pull-integration-test |


template <bool collect_hit_rate, bool only_lookup, typename Method>
ALWAYS_INLINE void Aggregator::executeImplBatch(
// This is only used by executeImplStringHashMapByCol.
// It will choose specifix submap of StringHashMap then do emplace/find.

Suggested change
- // It will choose specifix submap of StringHashMap then do emplace/find.
+ // It will choose specific submap of StringHashMap then do emplace/find.
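
For readers unfamiliar with the submap idea referenced in this comment, here is a rough sketch of choosing a submap per key and then processing each submap's rows together. It is an illustration under stated assumptions, not the PR's implementation: the names `SubMap`, `chooseSubMap`, `StringSumMap`, and `aggregateBySubMap` are hypothetical, the length thresholds (0, 1-8, 9-16, 17-24, longer) are assumed from the usual ClickHouse-style StringHashTable layout, and `std::unordered_map` stands in for the real per-submap hash tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical submap identifiers; the real StringHashMap has its own member names.
enum class SubMap : unsigned char
{
    Empty = 0, // zero-length key
    Len8 = 1,  // 1..8 bytes
    Len16 = 2, // 9..16 bytes
    Len24 = 3, // 17..24 bytes
    Long = 4   // anything longer
};

// Assumed thresholds; special cases of the real dispatch are not modeled here.
inline SubMap chooseSubMap(std::string_view key)
{
    const std::size_t n = key.size();
    if (n == 0)
        return SubMap::Empty;
    if (n <= 8)
        return SubMap::Len8;
    if (n <= 16)
        return SubMap::Len16;
    if (n <= 24)
        return SubMap::Len24;
    return SubMap::Long;
}

// Stand-in for a string-keyed aggregation map split into five submaps.
struct StringSumMap
{
    std::unordered_map<std::string, long long> maps[5];

    // Returns the sum for `key`, or 0 if the key is absent.
    long long sumOf(std::string_view key) const
    {
        const auto & m = maps[static_cast<std::size_t>(chooseSubMap(key))];
        const auto it = m.find(std::string(key));
        return it == m.end() ? 0 : it->second;
    }
};

// Pass 1 buckets row indexes by submap; pass 2 handles one submap at a time,
// so each inner loop touches a single plain hash table (where a prefetch loop could apply).
void aggregateBySubMap(StringSumMap & m, const std::vector<std::string> & keys, const std::vector<long long> & values)
{
    std::vector<std::size_t> rows[5];
    for (std::size_t i = 0; i < keys.size(); ++i)
        rows[static_cast<std::size_t>(chooseSubMap(keys[i]))].push_back(i);

    for (std::size_t s = 0; s < 5; ++s)
        for (const std::size_t row : rows[s])
            m.maps[s][keys[row]] += values[row];
}
```

The design point is that once rows are grouped by submap, each group can be treated like an ordinary single hash table, which is presumably why the description calls this easier to combine with prefetch.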

Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Aggregator support prefetch
2 participants